Effect of Reformed Courses in Physics and Physical Science 
on Student Conceptual Understanding 



Kathleen Falconer, Mangala Joshua, Sue Wyckoff & Daiyo Sawada 



A Paper Presented at the 2001 Annual Conference of the American Educational Research Association in 
Seattle, WA. 




Introduction 




The use of active-engagement methods in teaching physics has been shown by Hake (1998) to 
significantly improve students' conceptual understanding of calculus- and algebra-based physics. The 
present paper extends Hake's (1998) methods of analysis to PHY 121, a calculus-based physics course 
undergoing major changes in its modeling approach to learning, and to PHS 110, a large-enrollment 
inquiry-based general studies physical science class open to all students and taken by pre-service 
elementary education students. Here we report changes in students’ conceptual understanding of 
fundamental concepts in these introductory courses at Arizona State University (ASU) and several 
Maricopa Community Colleges in the Greater Phoenix area. 



During the past twenty years, science education researchers have been studying students’ conceptual 
understanding of science. This research indicates that at all levels of instruction students are not learning 
what teachers think they are teaching. This discrepancy seems to be due to a mismatch between how the 
teacher teaches and how the students learn (McDermott, 1993; Driver, 1989; Tobin, Tippins, & Gallard, 1994; 
von Glasersfeld, 1987, 1989). Teaching by lecturing with students passively taking notes is very 
ineffective. Instead, a much more effective way to learn is for students to be actively involved in thinking 
and discussing during both class and laboratory, with the goal of having the students develop a deep 
understanding of scientific concepts. This goal is in accord with the educational practices advocated by 
the major professional science education communities (American Association for the Advancement of 
Science [AAAS], 1989, 1993; National Research Council [NRC] of the National Academy of Sciences, 
1996). 



This report is divided into two parts. Part One reports the studies done to assess the reforms in PHS 110; 
Part Two reports the studies done to assess the reforms in PHY 121. 





Part One 



PHS 110 





The ASU PHS 110 class was reformed by Dr. Susan Wyckoff as part of the Arizona Collaborative for the 
Excellence in the Preparation of Teachers (ACEPT), a program funded by the National Science 
Foundation to reform key science and mathematics courses and curricula taken by students intending to 
become K-12 teachers. Dr. Wyckoff began with the conception that a reformed classroom was one where 
students are interactively engaged as members of a learning community. In her view, students in a 
reformed class analyze evidence, reflect upon their learning, make observations and make predictions. 
There is active participation by the students with a variety of levels and paths for their investigations. 



Reforming a Large Enrollment Physical Science Course 



Prior to the ACEPT program, the ASU Physical Science (PHS 110) course was taught using traditional 
lecture methods, and the entire class (typically 80-100 students) met three times a week in a large lecture 
room. Breakout laboratory sections associated with the course were scheduled at various two-hour time 
slots throughout the entire week. 

In fall of 1996 the major reforms introduced in the ASU PHS 110 course included: 1) closely coordinating 
the lecture with the laboratory activities, 2) scheduling all laboratory sections to meet between the 
Monday and Wednesday lecture times, 3) converting both lecture and laboratories to a learning cycle 
model of pedagogy, 4) introducing take-home laboratories as homework, and 5) implementing ClassTalk 
technology (Dufresne et al., 1996) to facilitate active engagement in the lectures. 

With the reformed schedule for PHS 110, students explored one or more new concepts on Tuesday of each 
week in small break-out laboratory sections (usually 10-15 students) before the concepts were discussed 
in the large class forum, which continued to meet on Monday, Wednesday and Friday in the large lecture 
hall. The laboratories were open-ended and discovery-based, with the laboratory instructors (graduate 
teaching assistants) using Socratic-type questioning. The initial Tuesday laboratory explorations provided 
the basis for the large group interactions in the Wednesday lecture, when new terms were introduced. 

ClassTalk is an electronic feedback system developed by Better Education which promotes group and 
all-class discourse in large enrollment environments. Each group of 3-4 students is equipped with a Texas 
Instruments TI-85 calculator, which is linked to a Macintosh computer controlled by the instructor. 
Multiple-choice conceptual questions can be displayed on the screen at the front of the classroom. The 
student groups then engage in discussion of the problem for several minutes, and each student sends 
his/her answer via the group's TI-85 calculator to the Macintosh, which compares the answers submitted 
with the correct choice, computes the class statistics for that problem, and displays a histogram of the 
number of answers the class gave for each answer choice. Thus the students immediately (within about 
30 seconds) receive anonymous feedback on the correctness of their reasoning. At the same time the 
instructor receives an overview of the class' understanding of the concept addressed in the question. With 
the aid of ClassTalk the instructor can quickly change back and forth between student-centered groups 
engaging in discourse and a full-class discussion, and the technology is also a very effective classroom 
management tool for controlling discussions. Thus ClassTalk provides an effective and efficient means 
of transforming the classroom discourse into an active, open-ended inquiry session. The students are 
challenged by ClassTalk to generate solutions cooperatively, articulate their reasoning and defend their 
choices. Through discourse students' understanding of relationships and models can be clarified and 
formalized. On Fridays and Mondays ClassTalk was used to review, consolidate and evaluate students' 
understanding of the one or two concepts introduced that week (in the Tuesday laboratories). The 
home-based (take-home) laboratory assignments extend scientific inquiry to the home environment, where 
family members can participate with the student in simple experiments involving everyday phenomena 
and inexpensive materials. 
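
The tallying step described for ClassTalk (compare submitted answers with the correct choice, compute class statistics, display a histogram) can be illustrated with a short sketch. This is not the ClassTalk software; the function names `tally_responses` and `render_histogram` and the sample answers are hypothetical and chosen only for illustration.

```python
from collections import Counter


def tally_responses(responses, choices, correct):
    """Count answers per choice and the fraction that match the correct choice.

    `responses` is a list of submitted answer labels, `choices` the allowed
    labels (e.g. "ABCD"), and `correct` the keyed answer. All hypothetical.
    """
    counts = Counter(responses)
    # Keep every allowed choice in the histogram, even if nobody picked it.
    histogram = {choice: counts.get(choice, 0) for choice in choices}
    total = len(responses)
    fraction_correct = histogram[correct] / total if total else 0.0
    return histogram, fraction_correct


def render_histogram(histogram):
    """Return one text bar per answer choice, e.g. 'C | #### 4'."""
    return [f"{choice} | {'#' * n} {n}" for choice, n in histogram.items()]


if __name__ == "__main__":
    # Hypothetical answers for one question; not data from the study.
    answers = ["A", "C", "C", "B", "C", "D", "C", "A"]
    hist, frac = tally_responses(answers, "ABCD", correct="C")
    print("\n".join(render_histogram(hist)))
    print(f"fraction correct: {frac:.2f}")
```

Because only aggregate counts are displayed, a summary of this kind keeps individual answers anonymous while still showing the instructor how the class is distributed across the answer choices.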

Instrumentation 

To evaluate students' conceptual understanding of physics concepts covered in the class, a traditional 
pretest-posttest quasi-experimental design with control groups was employed. An instrument, the Physics 
Concept Survey (PCS), was developed by one of the authors (SW) to measure students’ learning; it 
includes the most fundamental concepts in introductory physics. The instructors of the course were also 
evaluated for reform teaching practice using an instrument called the “Reformed Teaching Observation 
Protocol” (RTOP). The RTOP was developed to measure the degree to which the teaching was in accord 
with the ACEPT program criteria for reformed teaching. The RTOP does not purport to measure whether 
the instruction was good, only whether the instruction was reformed. 

Physics Concept Survey (PCS) 

Conceptual understanding of mechanics was measured using a fourteen-item, multiple-choice test, which 
is the mechanics sub-test (MPCS) of the PCS. The 30 items in the PCS were designed to measure 
students’ conceptual understanding of one-dimensional Newtonian mechanics, momentum, gravitational 
force, energy, electricity and magnetism, and basic properties of light. Many of the mechanics items are 
based directly on the Force Concept Inventory (FCI) (Halloun & Hestenes, 1985; Hestenes et al., 1992a, 
1992b). Other items were taken from Peer Instruction (Mazur, 1996) and from Conceptual Physics 
(Hewitt, 1999). The PCS was tested in the ASU PHS 110 classes during the fall 1996 and spring 1997 
semesters, after which an item analysis was performed on the original test. Several PCS items with 
reverse discrimination indices were revised, and the new version of the PCS was re-administered to the 
PHS 110 classes in the fall 1997 through spring 1999 semesters at ASU. An item analysis of the PCS after 
revision produced KR20 reliability coefficients that ranged from 0.62 to 0.87 for the entire test and the 
sub-tests. The item analysis of just the MPCS gave a KR20 reliability coefficient of 0.80. The MPCS 
and PCS are administered on paper. The PCS is currently being prepared for publication. 
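
For readers unfamiliar with the KR20 coefficient, the following minimal sketch shows the standard Kuder-Richardson Formula 20 for dichotomously scored (right/wrong) items. It assumes the population variance of the total scores; some texts use the sample variance instead, and the paper does not say which convention was applied. The function name `kr20` and the toy score matrix are hypothetical, not study data.

```python
def kr20(item_scores):
    """Kuder-Richardson Formula 20 for 0/1 item scores.

    `item_scores` is a list with one row per student; each row holds that
    student's 0/1 score on each of the k items.
    Assumption: the variance of total scores is the population variance.
    """
    n = len(item_scores)
    k = len(item_scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two students and two items")

    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    # Sum of p*q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_scores) / n
        sum_pq += p * (1 - p)

    return (k / (k - 1)) * (1 - sum_pq / var_total)


if __name__ == "__main__":
    # Toy data: four students, three items (illustrative only).
    toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(round(kr20(toy), 2))  # prints 0.75
```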

Reformed Teaching Observation Protocol (RTOP) 

The Reformed Teaching Observation Protocol (RTOP) (Piburn, M., Sawada, D., Turley, J., Falconer, K., 
Benford, R., Bloom, I., & Judson, E. 2000) was developed as an observation instrument to provide a 
standardized means for detecting the degree to which K-20 classroom instruction in mathematics or 
science is reformed. The developers, ACEPT's Evaluation Facilitation Group (EFG), did not presume that 
reformed instruction is necessarily quality instruction. Rather, we left that as a hypothesis to be 
examined and tested in and across various reformed settings. The RTOP draws upon the principles of 
reform and the work underlying the ACEPT project, including the "standards" in science and mathematics 
education [NCTM's Curriculum and Evaluation Standards (1989), Professional Teaching Standards 
(1991), Assessment Standards (1995) and NRC's National Science Education Standards (1996)]. The RTOP 
consists of five sub-scales with a maximum of 20 points for each sub-scale, for an overall total of 100. 
Despite the fact that each sub-scale is based on just five items, reliabilities are still very respectable, as 
shown in Table 1. The reliability of the total score is 0.954. 

Table 1 

Reliability Estimates of RTOP Sub-scales and Total Score 

Name of Sub-scale                                                  R-Squared 
Sub-scale 1: Lesson Design and Implementation                      0.915 
Sub-scale 2: Content - Propositional Pedagogic Knowledge           0.670 
Sub-scale 3: Content - Procedural Pedagogic Knowledge              0.946 
Sub-scale 4: Classroom Culture - Communicative Interactions        0.907 
Sub-scale 5: Classroom Culture - Student/Teacher Relationships     0.872 
Total Score                                                        0.954 



Data Collection 

The data on the control and experimental groups were collected in the fall 1998 (1 control, 1 experimental), spring 1999 
(1 control, 1 experimental) and fall 1999 (1 control, 1 experimental) semesters. The intervention was the ACEPT 
reform manner of teaching in the PHS 110 classes (experimental group). The control groups were selected 
introductory ASU physics classes and a community college class whose instructors had not been exposed to current 
reform teaching methods. The MPCS instrument was administered as a pretest-posttest for both the control and 
experimental courses. 

Fall 1998 Term 

In August 1998 an introductory physics course for pre-engineering students who had never taken physics previously 
was used as the control class. A very traditional lecturer taught the class; little or no student activity occurred in the 
class, and the laboratory was not related to lecture material. This course covered only mechanics, which was not 
apparent from the initial discussion with the professor about course content. Because the professor of the 
control group did not want his students to be tested on material not covered in his class, the control group took only 
the MPCS as a posttest. The students in the experimental PHS 110 class took the entire 30-item test as a posttest. 
Both classes took the PCS as a pretest. 

Spring 1999 Term 

In January 1999 the control class was a standard college conceptual physics class taken by non-science majors for 
laboratory science (liberal studies) credit. The professor teaching the course was considered a very good teacher and 
had won several teaching awards. Although this control course covered content similar to PHS 110, the professor of 
the control group decided that some of the material on the PCS was not adequately covered in his class. Therefore 
the control group took the MPCS as a posttest. The students in the experimental PHS 110 class took the entire 
30-item PCS test as a posttest. 

Fall 1999 Term 

In August 1999 the control class was PHS 110 taught at a community college. The professor teaching the course 
was considered to be a very good teacher. The control course covered most of the content of ASU's PHS 110, with the 
addition of chemistry. The students in the control class took the entire 30-item PCS test as a pretest-posttest. The 
students in the experimental PHS 110 class took the entire 30-item PCS test as a pretest-posttest. 

Attrition Problems 

The achievement data were analyzed using matched pairs for the pretests and posttests. Matched pairs were initially 
used because of attrition in the fall 1998 control group. The fall 1998 control group started with 58 students in two 
sections. By the end of the semester, there were only 16 to 20 students regularly attending the control class. Use of 
matched pairs assured that the same students took both the pretest and posttest. Attrition was not a factor in the 
other classes. The fall 1998 PHS 110 class started with 67 students in one section. By the end of the semester, 50-58 
students were regularly attending the PHS 110 class. The numbers were similar for the spring 1999 and fall 1999 
PHS 110 classes. The spring 1999 control class had 250 students in two sections. There were 70-100 students 
attending each class at the end of the semester. The fall 1999 control class had 23 of its 25 students 
attending at the end of the semester. 

RTOP 

In all three terms, all classes were observed at least two times to gather RTOP data. A pair of observers usually did 
the observations, one trained in mathematics, the other in Physics. 



Data Analysis 

Data were collected from three experimental classes and three control classes over three instructional terms. Each 
class was administered a content pretest and posttest. In addition, each class was observed two or more times to 
collect RTOP scores. A pretest mean, a posttest mean, and an RTOP mean were calculated for each class. The 
significance of the pretest and posttest scores per class and of the gains was calculated using a Student's t-test. The 
analysis was organized by term. 
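
The text does not state which form of the t-test was applied. Since the pretest and posttest scores were matched by student, one plausible reading for the within-class comparison is a paired t-test on the score differences. The sketch below computes that statistic under this assumption; the function name `paired_t` and the scores are hypothetical, and the p-value would still need to be read from a t distribution with the returned degrees of freedom.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for matched pre/post scores.

    Assumption: each position in `pre` and `post` belongs to the same student.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical matched scores out of 14; not data from the study.
    pre_scores = [5, 6, 7, 4]
    post_scores = [8, 8, 9, 7]
    t_stat, dof = paired_t(pre_scores, post_scores)
    print(f"t = {t_stat:.3f}, df = {dof}")
```

Comparing an experimental class with a control class would instead call for an independent-samples test, which is not shown here.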

Results 

RTOP Profiles 

Before examining the results of the quasi-experimental comparisons, it is important to establish whether 
the many reforms implemented in PHS 110 actually resulted in classrooms that were reformed. 
The Reformed Teaching Observation Protocol (RTOP) was used to observe the 
teaching in all of the sections of PHS 110, resulting in RTOP scores ranging from 66 to 79 for the 
experimental classes and from 16 to 43 for the control classes. As summarized in Table 2, all 
RTOP sub-scales revealed significant differences between the experimental and control groups at the 0.01 
level of significance. Only on Sub-scale 2 are the scores close. This makes sense since 
Sub-scale 2 focuses on “content” whereas the other scales focus on lesson design, classroom procedures, 
communicative interactions, and teacher-student relationships. These results suggest that the lessons of 
the control instructors are strongly coherent with respect to content but are lacking in the other 
dimensions of reform. Figure 1 summarizes the RTOP comparisons in graphic form. 

Table 2 

Differences Between Means on the Sub-scales of RTOP 
For PHS 110 and Control Groups 

Variable       PHS 110 Means    Control Means    Difference    Significance of the 
               (n = 6)          (n = 6)                        Difference 
Sub-scale 1    13.17            4.5              8.67          P<0.01 
Sub-scale 2    17               14.17            2.83          P<0.01 
Sub-scale 3    14               2.5              11.5          P<0.01 
Sub-scale 4    15.33            5.5              9.83          P<0.01 
Sub-scale 5    13.83            5.5              8.33          P<0.01 
Total Score    73.33            32.17            41.17         P<0.01 



Figure 1. RTOP Comparisons between PHS 110 and control classes 

The RTOP analysis verifies that the PHS 110 classes are substantially reformed. The question of whether 
such reforms make a difference in terms of conceptual gains is answered in the next section. 

Conceptual Gains 



For fall 1998 and spring 1999, the control classes scored significantly higher than the experimental 
classes on the MPCS pretests; however, the PHS 110 students performed significantly better than the 
control groups on the MPCS posttests. Thus, even though the reformed PHS 110 classes started with lower 
initial average MPCS scores, their average MPCS posttest scores were significantly higher than those of the 
control groups. These differences on the posttest are particularly interesting because the fall 1998 and 
spring 1999 control classes spent more time in the semester working on mechanics than did the experimental 
classes. In fact, the fall 1998 control class spent the entire semester working on mechanics. Even though 
these control classes had spent more time, their posttest scores were lower than those of the PHS 110 
students, who spent approximately five weeks on mechanics. 

For fall 1999, the control class scored higher than the experimental class on the MPCS pretest, but not 
statistically significantly higher. However, again on the posttest, the PHS 110 class had an average MPCS 
posttest score that was significantly higher than that of the control group. 

Table 3 

Means and Standard Deviations of the Experimental and Control Groups on the MPCS 

        Fall 98                     Spring 99                   Fall 99 
        Experimental  Control       Experimental  Control       Experimental  Control 
        M      SD     M      SD     M      SD     M      SD     M      SD     M      SD 
*Pre    [lost] 2.3    7.0    3.4    5.3    2.3    6.1    2.2    4.7    2.5    6.2    2.1 
*Post   9.1    2.5    7.6    2.7    9.5    2.7    7.8    2.3    9.8    2.2    7.9    2.4 

*Maximum Score is 14 



The gain score we are using is the Hake gain from R. Hake (1998). For a given set of students it is defined 
as (mean posttest score - mean pretest score) / (maximum test score - mean pretest score). 
The Hake gain or Hake Factor is usually associated with the Force Concept Inventory (FCI) (Hestenes, 
Wells & Swackhamer, 1992). The Hake gain attempts to adjust for differing initial student competence. 
The experimental classes demonstrated Hake gains at least twice those of the control groups over the 
three semesters studied. The difference in Hake gains between the experimental and the control courses 
is statistically significant (p<0.01). 
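
The Hake gain defined above is simple to compute from class means. The sketch below is a minimal illustration; the function name `hake_gain` is an assumption of this example, and the inputs in the usage section are class means reported in Table 3 (maximum score 14), so small differences from the published gains can arise from rounding of those means.

```python
def hake_gain(mean_pre, mean_post, max_score):
    """Normalized (Hake) gain for a class.

    (mean posttest - mean pretest) / (maximum score - mean pretest)
    """
    if mean_pre >= max_score:
        raise ValueError("mean pretest score must be below the maximum score")
    return (mean_post - mean_pre) / (max_score - mean_pre)


if __name__ == "__main__":
    # Fall 1998 control class: pretest mean 7.0, posttest mean 7.6.
    print(round(hake_gain(7.0, 7.6, 14), 2))  # prints 0.09
    # Spring 1999 experimental class: pretest mean 5.3, posttest mean 9.5.
    print(round(hake_gain(5.3, 9.5, 14), 2))  # prints 0.48
```

Dividing by the room left for improvement (maximum score minus pretest mean) is what lets classes with different starting points be compared on the same scale.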

Table 4 

Hake Gains of Experimental and Control Groups on the MPCS 

                  Fall 98                     Spring 99                   Fall 99 
                  Experimental  Control       Experimental  Control       Experimental  Control 
                  H             H             H             H             H             H 
Hake Gain*, **    0.45          0.09          0.48          0.21          0.54          0.22 

* Hake Gain is a normalized gain, calculated for a class as (avg. post score - avg. pre score) / (max. score possible - avg. pre score). 

** The Hake Gain for the Experimental Groups was significantly greater than the Hake Gain for the Control Groups (p<0.01). 

Relationship between RTOP and Conceptual Gains 



The control groups' normalized gains on the MPCS were significantly lower than those of the experimental 
groups (p<0.01). The RTOP scores for the control instructors ranged from 16 to 43. All of the control classes 
had average RTOP scores less than 40. For the experimental classes at ASU, the two instructors had 
similar average RTOP scores, with the same instructor having taught both fall semesters. The RTOP 
scores ranged from 66 to 79, which indicated good reform practice in the courses. The correlation 
coefficient between RTOP and the normalized gain on the MPCS was 0.98. This suggests that 
reform teaching (interactive, inquiry-oriented instruction) results in greater student conceptual 
understanding. 
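
A correlation coefficient of this kind can be computed as a Pearson product-moment correlation over the class-level pairs (average RTOP score, normalized gain). The sketch below assumes the Pearson form, which the text does not name explicitly; the function name `pearson_r` and the sample values are hypothetical placeholders rather than the study's per-class data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical class-level values, for illustration only.
    rtop_means = [20.0, 35.0, 30.0, 70.0, 75.0, 72.0]
    gains = [0.10, 0.20, 0.22, 0.45, 0.50, 0.52]
    print(f"r = {pearson_r(rtop_means, gains):.2f}")
```

With only six classes, a coefficient computed this way describes the sample well but carries wide uncertainty, which is worth keeping in mind when interpreting a value such as 0.98.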



[Chart: Average RTOP Score and Normalized Gain on Mechanics Subtest of PCS. Vertical axis: normalized gain score on the mechanics subtest and average RTOP score (normalized to 1). Horizontal axis: control and experimental classes by term, fall 1998 through fall 1999.] 

The correlation coefficient between RTOP and normalized gain on the mechanics subtest of PCS is 0.98. 

Figure 2. Average RTOP Score and Normalized Gain (Hake Gain) on the Mechanics Subtest of 
the Physics Concept Survey for Experimental and Control classes, Fall 1998-Fall 1999 

Summary and Conclusions 








We conclude that students in the ACEPT reformed PHS 110 course learned significantly more physics concepts than 
students in the traditional lecture classes (control groups). Our work therefore extends that of Hake (1998) from 
science students in interactive engagement physics classes to physics classes for non-science students. The 
significance of our investigation is that non-science students demonstrate significantly greater learning of 
fundamental physics concepts in active-engagement classroom environments compared with traditional lecture 
classes. Furthermore, our data show that significant learning can take place in large-enrollment classrooms managed 
by efficient electronic feedback systems such as ClassTalk. Our results have important bearings on effective 
teaching of undergraduate general studies laboratory physics courses taken by pre-service elementary education 
students. A well-designed physics or physical science course for elementary education students that 
incorporates results from studies such as this one has the potential to significantly improve elementary 
teachers’ understanding of physics concepts, and therefore their ability to teach the subject well. 

Our work addresses a possible concern with Hake (1998): the wide spread in the normalized gains for the 
Interactive Engagement courses. Hake indicated there was “the presence of implementation 
problems” (Hake, 1998, p. 70). However, the level of interactive engagement (reform) was entirely 
self-reported. The RTOP, as a measure of reform, removes the self-reporting from the level of reform. The 
high correlation between the RTOP and normalized gains would support Hake’s hypothesis of implementation 
problems. 

Part Two 

PHY 121 

The Modeling Method is both a philosophy of teaching and a curriculum for physics education. David 
Hestenes developed the philosophy. In 1983, Hestenes had drafted a paper on physics pedagogy 
using the ideas of scientific models for teachers to use to teach physics. The curriculum part of the 
Modeling Method was originally developed and enacted by a high school physics teacher, Malcolm 
Wells, with Prof. David Hestenes (Hestenes, Wells & Swackhamer, 1995). The Modeling Method was 
developed to correct what Hestenes and others saw as weaknesses of the traditional 
lecture-demonstration method. These weaknesses were quantified by two multiple-choice conceptual assessments 
of students' understanding of mechanics: the Mechanics Baseline Test (Hestenes & Wells, 1992) and the 
Force Concept Inventory (Hestenes, Wells & Swackhamer, 1992). The National Science Foundation 
supplied funding for the development of assessments and workshops for high school teachers. The high 
school teachers in the second year of the workshop helped to modify and expand the curriculum. It is this 
expanded and modified curriculum that is referred to as the Modeling Method. It can be found at 
<http://modeling.la.asu.edu>, as can the educational research program directed by Prof. David 
Hestenes (Hestenes, 1997). 

Modeling Method - Curriculum and Philosophy 

The Modeling Method seeks to address perceived weaknesses of the traditional lecture-demonstration 
method, including the fragmentation of knowledge, student passivity, and the persistence of naive beliefs 
about the physical world. In modeling, the students are engaged in understanding the physical world by 
constructing and using scientific models to describe, to explain, to predict and to control physical 
phenomena. Student understanding is facilitated by having the students learn and use a variety of 
conceptual tools, including Microcomputer Based Laboratories (MBLs), Calculator Based Laboratories 
(CBLs), white boards, etc. Students use these tools along with a small set of basic models to develop insight into the 
structure of scientific knowledge by examining how models fit into theories. The students evaluate the 
scientific models through comparison to empirical data they have collected. For example, students start 
out in mechanics by modeling an object as a single particle, but the single particle model breaks down 
when internal energy has to be taken into account. The students realize the single particle model doesn’t 
work by collecting and analyzing data from a bouncing ball. Then the extended body model is introduced. 
The students are forced to make predictions and confront their naive beliefs about the physical world. 

The Modeling Method of instruction is organized into modeling cycles: model development, evaluation 
and application. The teacher sets the stage for student activities, typically with a demonstration and class 
discussion to establish common understanding of a question to be asked of nature. Then, in small groups, 
students collaborate in planning and conducting experiments to answer or clarify the question. After the 
students have conducted the experiments, they are required to present and justify their conclusions in oral 
and/or written form to the group. Particular attention is paid to the formulation of models for the 
phenomena in question and evaluation of the models by comparison with data. Only after the students 
have experienced the phenomena and have grappled with the concepts does the teacher introduce 
technical terms and concepts. The Modeling curriculum has a definite agenda for student progress and a 
taxonomy of typical student misconceptions. These help facilitate the teacher's guidance of student 
inquiry and discussion as students are induced to articulate, analyze and justify their personal beliefs. The 
teachers use "Socratic" questioning and remarks to address the students' naive conceptions. The Modeling 
Method of instruction promotes an integrated understanding of the process of developing, constructing 
and assessing scientific models. 

Discourse Practice: Re-Modeling University Physics 

The modeling approach at Arizona State University has undergone a subtle yet major transformation in the 
past three years, as embodied for example in the ReModeling Summer Workshop 1999. Led by Dwain 
Desbien, a doctoral student whose own teaching had generated the highest Hake gains to date, the intent 
of the workshop was to help participants experience and understand how critically important the 
formation of discourse communities was in the modeling classroom. In bringing the workshop 
participants into an inquiry community in its own right, the workshop leader enabled them to appreciate 
how students could be guided to constitute inquiry communities participating in interactive 
communication that not only led to the critique and creation of basic models of physics but in doing so 
made the community they constituted aware of its own practice (see Sawada, 1999 for more detail). 

The workshop had a major impact on most of its participants and set the conditions for assessing the 
remodeling research reported here. Some of the participants in the summer workshop were scheduled to 
be instructors for PHY 121 in the fall term, making it possible to set up a quasi-experimental design for 
assessing the effects of the new discourse-oriented approach to modeling. 

Instrumentation 

To evaluate students' conceptual understanding of physics concepts covered in the classes, a traditional 
pretest-posttest quasi-experimental design with control groups was employed. The Force Concept 
Inventory (FCI) was used. The FCI was originally developed to measure conceptual gains in mechanics 
for Modeling Method instruction, but is now used by many physics educators to measure conceptual 
gains in student understanding (Hake, 1998). The instructors of the courses were evaluated for reform 
teaching practice using an instrument called the “Reformed Teaching Observation Protocol” (RTOP). 
Since the Modeling Method had become associated with ACEPT, the Modeling Method was assumed to meet 
the ACEPT program criteria for reformed teaching. 

Force Concept Inventory (FCI) 

Conceptual understanding of mechanics was measured using a thirty-item, multiple-choice test. The 30 
items in the FCI were designed to measure students’ conceptual understanding of Newtonian mechanics 
and forces (Halloun & Hestenes, 1985; Hestenes et al., 1992a, 1992b). A critique of the FCI was published 
by Heller, in which she disputes some of Halloun & Hestenes' claims, but the FCI is still the standard test used 
in Physics Education Research (PER) on mechanics. 

Reformed Teaching Observation Protocol (RTOP) 

The Reformed Teaching Observation Protocol (RTOP) (Piburn, M., Sawada, D., Turley, J., Falconer, K., 
Benford, R., Bloom, I., & Judson, E. 2000) was developed as an observation instrument to provide a 
standardized means for detecting the degree to which K-20 classroom instruction in mathematics or 
science is reformed. Please see the fuller description in Part One. 



Data Collection 



In order to assess whether the remodeled approach could lead to high achievement in PHY 121, introductory 
mechanics with calculus, a pretest-posttest quasi-experimental design was arranged. The data on the control and 
experimental groups were collected in fall 1999 (with three experimental classes and one control class). There were 
originally two control classes, but the instructor of the second control class, after an initial acceptance, refused to 
allow the posttest administration of the FCI. The intervention was the Modeling Method of reform teaching in the 
PHY 121 classes (experimental group). The control groups were the other two ASU PHY 121 classes whose 
instructors had not been exposed to current reform teaching methods. The FCI instrument was administered as a 
pretest-posttest for both the control and experimental courses. 

Fall 1999 Term 

• Experimental Class 1: A PHY 121 course offered at a Maricopa Community College, taught by the 
leader of the Modeling Summer Workshop. 

• Experimental Class 2: A PHY 121 course offered in “studio” mode as part of an integrated approach to 
preparing engineers, taught by an instructional team consisting of the leader of the Modeling Summer 
Workshop, a Professor of Physics and a graduate student, both of whom had attended the Summer 
Workshop. 

• Experimental Class 3: An “honors” section of PHY 121 at ASU, taught by a co-teaching colleague of 
Dwain Desbien. The two Teaching Assistants had attended the Modeling Summer Workshop. 

• Control Class: A “regular” large lecture section of PHY 121 at ASU. The instructor had not attended the 
Modeling Summer Workshop. 

Attrition Problems 

The achievement data were analyzed using matched pairs for the pretests and posttests. The pretest and posttest 
means were calculated for each class, except for the second control class, which did not take the posttest. There was no 
statistically significant difference between the pretest means for the two control classes. 

RTOP 

All classes were observed at least two times to gather RTOP data. The RTOP instrument was administered a 
minimum of 3 and a maximum of 7 times by a pair of observers, one trained in mathematics, the other in 
physics. 

Data Analysis 



Data were collected from three experimental classes and one control class over the instructional term. Each class 
was administered a content pretest and posttest. In addition, each class was observed two or more times to collect 
RTOP scores. A pretest mean, a posttest mean, and an RTOP mean were calculated for each class. 



Results 



RTOP Profiles 

Before examining the results of the quasi-experimental comparisons, it is important to establish whether 
the many reforms implemented in PHY 121 actually resulted in classrooms that were reformed. The 
Reformed Teaching Observation Protocol (RTOP) was used to observe the teaching in all of the 
experimental classes of PHY 121, resulting in RTOP scores ranging from 78 to 99 for the experimental 
classes and from 17 to 39 for the control classes. The second control group was included in the sub-scale 
analysis, since the question was whether these classes are reformed, per our definition via RTOP. As 
summarized in Figure 3, all RTOP sub-scales revealed significant differences between the experimental 
and control groups at the 0.01 level of significance. Only on Sub-scale 2 are the scores close. This makes 
sense, since Sub-scale 2 focuses on “content” whereas the other scales focus on lesson design, pedagogy, 
discourse and teacher-student relationships. These results suggest that the lessons of the control 
instructors are strongly coherent with respect to content but are lacking in the other dimensions of 
reform. A total of 19 RTOP observations were made by two different observers of the four classes 
participating in the study. As shown in Table 5, the mean RTOP scores for the experimental classes are 
all substantially higher than the mean for the control class (p < .01). 

Figure 3. RTOP Comparisons Between Experimental (PHY 121) and Control Classes 



Table 5 

RTOP Mean Scores for Each Class 

Class             No. of RTOP Administrations    Mean RTOP Score 
Experimental 1    3                              98.5 
Experimental 2    7                              79.7 
Experimental 3    4                              85.7 
Control           5                              27.2 

Conceptual Gains 



The Force Concept Inventory (FCI) was administered early in the semester as a pretest and again toward 
the end of the semester as a posttest. As shown in Table 6, the raw gain (the difference between the mean 
pretest and posttest scores) for Experimental Class 3 is only marginally higher than the gain for the 
Control class (5.6 vs. 5.0). This unexpected outcome could be due to a “ceiling” effect: Experimental 3 is 
an honors class with a pretest mean much higher than the control pretest mean, which leaves less room 
for improvement. The Normalized Gain score is precisely the kind of score that adjusts for ceiling effects. 
As shown in Table 6, when Normalized Gain scores are considered, Experimental 3 scored higher (0.51) 
than the control class (0.32). Indeed, the Normalized Gain score for each of the experimental groups was 
significantly higher than the Normalized Gain for the control class (p < .01). 



Table 6 

Pretest - Posttest Comparisons of Experimental and Control Classes on the FCI 

FCI              Experimental 1    Experimental 2    Experimental 3    Control 
Pretest Mean     13.9              13.7              19.1              14.3 
Posttest Mean    23.6              21.3              24.7              19.3 
Hake Gain*       0.60              0.47              0.51              0.32 

* Normalized Gain = Gain / (Max score - Pretest mean). Referred to as the “Hake” factor. 
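
The normalized gains in Table 6 can be checked directly from the class means. The following is a minimal sketch, not the authors' analysis code: it applies the footnote formula to the rounded pretest and posttest means, and it assumes a maximum FCI score of 30, which is not stated in this section but is consistent with the reported gains. The names `FCI_MAX_SCORE`, `table6` and `normalized_gain` are illustrative only.

```python
# Class means copied from Table 6: (pretest mean, posttest mean).
table6 = {
    "Experimental 1": (13.9, 23.6),
    "Experimental 2": (13.7, 21.3),
    "Experimental 3": (19.1, 24.7),
    "Control": (14.3, 19.3),
}

# Assumption: the FCI is scored out of 30 (not stated in this section).
FCI_MAX_SCORE = 30


def normalized_gain(pre_mean, post_mean, max_score=FCI_MAX_SCORE):
    """Gain / (Max score - Pretest mean), as in the Table 6 footnote."""
    return (post_mean - pre_mean) / (max_score - pre_mean)


# Normalized gain for each class, keyed by class name.
gains = {name: normalized_gain(pre, post) for name, (pre, post) in table6.items()}

for name, (pre, post) in table6.items():
    print(f"{name}: raw gain {post - pre:.1f}, normalized gain {gains[name]:.2f}")
```

Under these assumptions the sketch illustrates the ceiling effect discussed above: Experimental 3 and the Control class have similar raw gains (5.6 and 5.0), but the honors class had only 10.9 points of headroom against 15.7 for the control class, so its normalized gain is considerably larger.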



Relationship between RTOP and Conceptual Gains 



The control group's normalized gain on the FCI was significantly lower than those of the experimental 
groups (p < 0.01). The mean RTOP score for the control instructor was 27.2. For the experimental classes, 
the RTOP scores ranged from 77 to 99, indicating good reform practice in the courses; the mean RTOP 
scores were 98.5, 79.7 and 85.7. The correlation coefficient between RTOP and the normalized gain on 
the FCI was 0.97. This would indicate that reform teaching, that is, interactive inquiry-oriented 
instruction, results in greater student conceptual understanding. 
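
As an illustration of how such a coefficient can be obtained, the sketch below computes a Pearson product-moment correlation from the four class-level values in Tables 5 and 6. This is an assumption on our part: the text does not say which correlation coefficient was used or whether it was computed on class means. The function name `pearson_r` and the variable names are illustrative.

```python
from math import sqrt

# Class-level values copied from Table 5 (mean RTOP) and Table 6 (Hake gain),
# in the order Experimental 1, Experimental 2, Experimental 3, Control.
rtop_means = [98.5, 79.7, 85.7, 27.2]
hake_gains = [0.60, 0.47, 0.51, 0.32]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(rtop_means, hake_gains)
print(f"r = {r:.2f}")
```

With the rounded table values this gives a coefficient of about 0.97, in agreement with the value reported above. Note that it rests on only four data points.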

[Chart: Average RTOP Score and Normalized Gain on Force Concept Inventory. Axis: Normalized Gain Score on FCI and Avg. RTOP Score (normalized to 1). The correlation coefficient between RTOP and normalized gain on the FCI is 0.97.] 

Figure 4. Normalized Gain on the Force Concept Inventory vs. RTOP for Experimental and Control Classes. 



Conclusion 



The results indicate that students taught with the remodeled approach in the three experimental classes 
outperformed those in the control class on the Force Concept Inventory. Furthermore, the three 
experimental classes are much more reformed than the control class, as indicated by data from the 
Reformed Teaching Observation Protocol. 

Hake used the term interactive engagement to mean something very similar to our version of reform. 
When compared to Hake's histogram of average normalized gains, the experimental gain scores are 
comparable to the middle to high range of gains for interactive engagement. Interestingly, the control 
gain of 0.32 was the highest of the reported traditional classes (Hake, 66, 1998). However, Hake makes 
the claim that several of the lower normalized gain scores should not be considered as interactive 
engagement because the implementation was not correct. Hake based these comments on others' 
understanding that the classrooms were not interactive engagement, even though they had been 
self-reported as such. This is the advantage of using an instrument for classroom observations like the 
RTOP, which could have been used to measure the difference in reform in the classrooms and helped 
answer whether or not interactive engagement works in the physics classroom. 

The results shown in Part One and Part Two of this paper reveal significant achievement gains in favor of 
the experimental classes for both physics and physical science courses. It is safe to conclude that the 
reforms embodied in PHY 121 and PHS 110 result in both greater gains in student achievement (FCI) and 
higher scores on classroom reform (RTOP). The quasi-experimental designs offer support for the claim 
that the greater achievement gains are attributable to the reformed nature of the instruction in the 
experimental groups. However, it must be admitted that quasi-experimental designs do have many threats 
to validity, not the least of which is the non-randomness of assignment of subjects to treatments. For these 
and other reasons, we do not claim that reformed instruction causes higher achievement gains. 

However, the following correlation data offer a striking portrayal of the relationship between reformed 
teaching and achievement gains. Because Normalized Gain scores are pure numbers, the data from PHY 
121 and PHS 110 can be combined. The RTOP score and the Normalized Gain score for each of these ten 
classes are shown in Figure 5. The relationship between degree of reform and amount of normalized gain 
is striking: the two measures appear to rise and fall in lock-step fashion. 



[Chart: Average RTOP Score and Normalized Gain for PHS 110 and PHY 121. Axis: Normalized Gain Score and Avg. RTOP Score (normalized to 1). The correlation coefficient between RTOP and normalized gain is 0.94.] 

Figure 5. The Relationship Between Degree of Reformed Teaching and Student Learning 

The correlation between RTOP and Normalized Gain for the 10 data points shown in Figure 5 is 0.94. 
Although the sample size is small, a correlation of this magnitude is significant at the .01 level. 
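
The significance claim can be illustrated with the usual t test for a correlation coefficient. The sketch below is only one way to check it and is an assumption on our part, since the text does not name the test that was used. It converts r = 0.94 with n = 10 into a t statistic with n - 2 degrees of freedom and compares it with the two-tailed critical value for the .01 level taken from a standard t table. The variable names are illustrative.

```python
from math import sqrt

r = 0.94  # correlation reported for the ten classes in Figure 5
n = 10    # number of classes (data points)

# t statistic for testing the null hypothesis of zero correlation,
# with n - 2 degrees of freedom.
degrees_of_freedom = n - 2
t_stat = r * sqrt(degrees_of_freedom) / sqrt(1 - r ** 2)

# Two-tailed critical value of Student's t for alpha = .01 and 8 degrees
# of freedom, from a standard t table. The choice of test is an assumption.
T_CRIT_01_DF8 = 3.355

significant = abs(t_stat) > T_CRIT_01_DF8
print(f"t({degrees_of_freedom}) = {t_stat:.2f}, critical value = {T_CRIT_01_DF8}")
print("significant at the .01 level:", significant)
```

Under these assumptions the statistic is roughly 7.8, well above the critical value, so the claim holds even with only ten data points.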

We conclude that a very strong relationship exists between reformed teaching and student learning. 